[bgp/agg]: Add BGP aggregate address test cases for Config Persistence and Recovery #23347
shixizhang merged 4 commits into sonic-net:master
Conversation
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
shixizhang force-pushed the branch from f123c6f to e967b7f
shixizhang force-pushed the branch from e967b7f to 27700c7
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Shixi Zhang <[email protected]>
shixizhang force-pushed the branch from 27700c7 to 391c099
shixizhang force-pushed the branch from 489d455 to 759fc61
… patch

On KVM/VS the warm-reboot script has a 1s timeout for docker-exec health checks that is too tight, causing fpmsyncd and orchagent to crash during warm reboot. This is the same issue that AdvancedReboot handles by patching the timeout to 5s (tests/common/fixtures/advanced_reboot.py). Apply the same sed patch to the warm-reboot script before rebooting, and restore the original safe_reboot=True call with full recovery checks.

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Shixi Zhang <[email protected]>
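The timeout fix described in this commit message can be sketched as a simple substitution applied to the warm-reboot script body before the reboot is triggered. This is a hedged illustration only: the function name, the example script fragment, and the exact `timeout 1 docker exec` pattern are assumptions; the real regex used by `advanced_reboot.py` may differ.

```python
import re


def patch_docker_exec_timeout(script_text: str, new_timeout: int = 5) -> str:
    """Relax the 1-second docker-exec health-check timeout in a
    warm-reboot script body (illustrative sketch, not the real patch)."""
    # Rewrite e.g. "timeout 1 docker exec ..." -> "timeout 5 docker exec ..."
    return re.sub(r"\btimeout 1 docker exec\b",
                  "timeout {} docker exec".format(new_timeout),
                  script_text)


# Hypothetical warm-reboot script fragment
original = 'RET=$(timeout 1 docker exec syncd echo ok)'
print(patch_docker_exec_timeout(original))
# RET=$(timeout 5 docker exec syncd echo ok)
```

On a real DUT the same substitution would be applied in place with `sed -i` before invoking the warm-reboot script.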
shixizhang force-pushed the branch from 759fc61 to 48ec752
StormLiangMS left a comment
Hi @shixizhang, nice test cases! One issue:
`pytest.mark.device_type('vs')` contradicts physical test verification: the PR description says tests were validated on a physical m1-48 testbed, but `device_type('vs')` restricts execution to VS devices when `--device_type` is specified. Please use `pytest.mark.device_type('physical')` or remove the marker entirely.
Also a minor concern: TC5.2/TC5.3 do `sudo config save -y` before cleanup. If the test fails after save but before teardown, the aggregate entry persists across reboots. Please ensure the `setup_teardown` fixture handles rollback of saved config.
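A teardown that guards against the stale-save problem raised above can be sketched as follows: roll back to the GCU checkpoint taken at setup, drop the checkpoint, then re-save so that a mid-test `sudo config save -y` cannot leave the aggregate entry in the on-disk config. The checkpoint name and helper function are hypothetical; only the `config rollback` / `config delete-checkpoint` / `config save` CLI verbs are standard SONiC commands, and a stub DUT stands in for a real duthost so the sketch runs anywhere.

```python
class FakeDut:
    """Runnable stand-in for a sonic-mgmt duthost; records shell commands."""
    def __init__(self):
        self.commands = []

    def shell(self, cmd):
        self.commands.append(cmd)


def rollback_and_save(duthost, checkpoint="agg_resilience"):
    """Teardown step (hypothetical): undo any test changes via the GCU
    checkpoint, then persist the rolled-back config to disk so nothing
    survives a subsequent reboot."""
    duthost.shell("config rollback {}".format(checkpoint))
    duthost.shell("config delete-checkpoint {}".format(checkpoint))
    duthost.shell("sudo config save -y")


dut = FakeDut()
rollback_and_save(dut)
print(dut.commands)
```

The key ordering point is that `config save -y` runs after the rollback, so the on-disk config always reflects the clean state.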
- Remove device_type('vs') marker that contradicts physical testbed validation
- Add local setup_teardown fixture with config save after rollback to prevent
aggregate entries from persisting in on-disk config across reboots
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Addressed the issue in a new iteration.
shixizhang force-pushed the branch from 6665622 to 427f94c
shixizhang force-pushed the branch from 427f94c to 9d96cec
shixizhang force-pushed the branch from 9d96cec to a343dab
- Sync is_virtual_platform(duthost) fix for warm reboot KVM check
- Change topology mark from (t1, m1) to m1 only

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Shixi Zhang <[email protected]>
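The runtime platform check named in this commit can be sketched like this: instead of a collection-time `device_type` marker, the test inspects DUT facts at run time and applies the VS-specific warm-reboot patch only when needed. The `platform` fact key and the `x86_64-kvm` prefix are assumptions for illustration; sonic-mgmt's real helper may look at different facts (e.g. `asic_type`).

```python
def is_virtual_platform(duthost) -> bool:
    """Return True when the DUT looks like a KVM/VS device.
    Fact key and platform-string prefix are illustrative assumptions."""
    platform = duthost.facts.get("platform", "")
    return platform.startswith(("x86_64-kvm", "kvm"))


class StubDut:
    """Runnable stand-in for a duthost; carries only a facts dict."""
    def __init__(self, platform):
        self.facts = {"platform": platform}


print(is_virtual_platform(StubDut("x86_64-kvm_x86_64-r0")))    # True
print(is_virtual_platform(StubDut("x86_64-arista_7050_qx32")))  # False
```

The advantage over a marker is that the same test file collects and runs on every topology, and only the timeout workaround is conditional.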
shixizhang force-pushed the branch from 132d14e to 2babda5
@StormLiangMS Thanks for the review! Both issues have been addressed: the device_type('vs') marker has been removed, and the topology is now restricted to m1 only (no device_type marker).
StormLiangMS left a comment
@shixizhang: Approved ✅
All prior review comments addressed in commits a343dab and 2babda5:
- ✅ `device_type('vs')` removed, replaced with an `is_virtual_platform(duthost)` runtime check in TC 5.5 only (correct approach)
- ✅ Topology narrowed to `m1`, appropriate for aggregate-address tests that need a real multi-ASIC or physical topology
- ✅ `setup_teardown` fixture now does `config save -y` after rollback, preventing stale aggregate entries from persisting in on-disk config across reboots
Code quality observations:
- Clean reuse of helpers from `test_bgp_aggregate_address.py`, no duplication
- Good pre/post disruption verification strategy (CONFIG_DB only pre, full stack post)
- `wait_for_aggregate_state()` helper with polling handles the async bgpcfgd race correctly
- All 5 test cases have proper try/finally cleanup with graceful fallback to checkpoint rollback
- TC 5.4 (BBR) properly saves and restores the BBR default state
- Warm reboot VS timeout patch mirrors the established pattern from `advanced_reboot.py`
CI: All required checks passing. The only failure is the OPTIONAL t1-lag-vpp test — unrelated.
LGTM — ready to merge.
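The polling helper the reviewer highlights can be sketched as a generic wait loop: bgpcfgd writes the aggregate's STATE_DB entry asynchronously after a disruption, so a one-shot check races with it and the test must retry until the expected state appears or a deadline passes. The signature and state values below are assumptions, not the PR's actual helper.

```python
import time


def wait_for_aggregate_state(get_state, expected="active",
                             timeout=60, interval=2):
    """Poll get_state() until it returns `expected` or `timeout` seconds
    elapse. Returns True on success, False on timeout (hypothetical
    sketch of the PR's wait_for_aggregate_state helper)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_state() == expected:
            return True
        time.sleep(interval)
    return False


# Simulated STATE_DB read that becomes "active" on the third poll
states = iter(["", "configured", "active", "active"])
print(wait_for_aggregate_state(lambda: next(states), timeout=10, interval=0))
# True
```

In the real test, `get_state` would be a closure that reads the aggregate's STATE_DB entry on the DUT each iteration.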
Description of PR
Summary:
Add a new test file, test_bgp_aggregate_address_resilience.py (Test Group 5), that validates BGP aggregate-address configuration persistence and recovery across various disruption scenarios. These 5 new test cases verify that aggregate address configuration written via GCU survives BGP container restarts, config reloads, cold reboots, warm reboots, and BBR state transitions.

New test cases:
- test_aggregate_persists_bgp_container_restart: aggregate config survives a BGP container restart; CONFIG_DB + STATE_DB + FRR are consistent after recovery.
- test_aggregate_persists_config_reload: aggregate config (with summary-only=true) survives config save + config reload.
- test_aggregate_persists_config_save_and_reboot: IPv6 aggregate config survives config save + cold reboot.
- test_aggregate_bbr_required_inactive_persists_bgp_restart: BBR-required aggregate stays inactive after BGP restart when BBR is disabled; activates once BBR is enabled.
- test_aggregate_persists_warm_reboot: aggregate config survives warm reboot.

Type of change
Back port request
Approach
What is the motivation for this PR?
Existing BGP aggregate-address tests cover configuration validation and route propagation behavior, but there are no tests verifying that aggregate address configuration persists across operational disruptions such as BGP container restarts, config reloads, and device reboots. This PR fills that gap by adding resilience tests that validate CONFIG_DB, STATE_DB, and FRR running-config consistency after each disruption type.
How did you do it?
- Added test_bgp_aggregate_address_resilience.py, reusing existing helpers and fixtures from test_bgp_aggregate_address.py (AggregateCfg, gcu_add_aggregate, gcu_remove_aggregate, verify_bgp_aggregate_consistence, verify_bgp_aggregate_cleanup, dump_db, and the setup_teardown checkpoint/rollback fixture).
- Used the bgp_neighbors fixture to discover BGP neighbor IPs for session-state polling after disruptions.
- Added a wait_for_aggregate_state() helper to handle the asynchronous bgpcfgd STATE_DB population after disruptions.
- Each test cleans up in finally blocks with graceful fallback to checkpoint rollback.

How did you verify/test it?
Ran all test cases on a physical m1-48 testbed with Arista EOS neighbors.

Any platform specific information?
No platform-specific dependencies. Tests use GCU for configuration and standard SONiC reboot/reload utilities, which are platform-agnostic.
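As an illustration of the GCU-driven configuration these tests exercise, a JSON patch adding one aggregate entry might look like the following. The `BGP_AGGREGATE_ADDRESS` table name, key format, and field names are assumptions inferred from the test names, not a confirmed CONFIG_DB schema; only the JSON Pointer escaping (`/` becomes `~1`, `~` becomes `~0`) is standard.

```python
import json


def aggregate_patch(prefix: str, summary_only: bool = False) -> list:
    """Build a GCU JSON patch adding one aggregate entry.
    Table and field names are illustrative assumptions."""
    # RFC 6901: escape "~" first, then "/" inside a JSON Pointer token
    key = prefix.replace("~", "~0").replace("/", "~1")
    return [{
        "op": "add",
        "path": "/BGP_AGGREGATE_ADDRESS/{}".format(key),
        "value": {"summary-only": "true" if summary_only else "false"},
    }]


print(json.dumps(aggregate_patch("10.10.0.0/16", summary_only=True), indent=2))
# On the DUT this patch would be applied with: sudo config apply-patch patch.json
```

`config apply-patch` is the standard SONiC GCU entry point; the checkpoint/rollback fixture mentioned above would bracket this change so teardown can undo it.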
Supported testbed topology if it's a new test case?
t1, m1 (declared via @pytest.mark.topology("t1", "m1"))

Documentation
Aligned with BGP-Aggregate-Address test plan